Predicting end of utterance in multimodal and unimodal conditions

Authors

  • Pashiera Barkhuysen
  • Emiel Krahmer
  • Marc Swerts
Abstract

In this paper, we describe a series of perception studies on uni- and multimodal cues to end of utterance. Stimuli were fragments taken from a recorded interview session, consisting of the parts in which speakers provided answers. The answers varied in length and were presented without the interviewer's preceding question. Subjects had to predict when the speaker would finish his turn, based on visual and/or auditory material. The experiment consisted of three conditions: in one condition, the stimuli were presented as they were recorded (both audio and video); in the two remaining conditions, stimuli were presented in only the auditory or only the visual channel. Results show that the audiovisual condition evoked the fastest reaction times and the visual condition the slowest. Arguably, cues from different modalities function as complementary sources of information and might thus improve prediction.

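As a rough illustration of how reaction times across the three presentation conditions might be compared (the abstract does not detail the statistical analysis used in the paper), the sketch below assumes hypothetical per-trial reaction times, measured relative to the actual end of utterance, and runs a one-way ANOVA over the three conditions:

    import numpy as np
    from scipy import stats

    # Hypothetical reaction times in ms, relative to the actual end of utterance;
    # the values here are invented for illustration only.
    rt = {
        "audiovisual": np.array([120, 95, 140, 110, 130]),   # reported as fastest
        "audio_only":  np.array([160, 150, 175, 155, 170]),
        "video_only":  np.array([210, 230, 205, 220, 240]),  # reported as slowest
    }

    for condition, values in rt.items():
        print(f"{condition}: mean RT = {values.mean():.0f} ms")

    # One-way ANOVA over the three presentation conditions.
    f, p = stats.f_oneway(rt["audiovisual"], rt["audio_only"], rt["video_only"])
    print(f"F = {f:.2f}, p = {p:.4f}")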

Similar articles

The interplay between the auditory and visual modality for end-of-utterance detection.

The existence of auditory cues such as intonation, rhythm, and pausing that facilitate end-of-utterance detection is by now well established. It has been argued repeatedly that speakers may also employ visual cues to indicate that they are at the end of their utterance. This raises at least two questions, which are addressed in the current paper. First, which modalities do speakers use for sign...

Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data

When the observations reflect a multimodal, asymmetric, or truncated structure, or a combination of these, using the usual unimodal and symmetric distributions leads to misleading results. Therefore, distributions capable of modeling skewness, multimodality, and truncation have always been of central interest in the statistical literature. There are different methods to construct ...

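For orientation, a weighted distribution re-weights a base density f with a non-negative weight function w, giving f_w(x) = w(x) f(x) / E[w(X)]. The sketch below is only an illustrative example of this general construction (not the method proposed in the paper): it applies a length-biased weight w(x) = x to an exponential density and normalizes the result numerically, which visibly skews the shape.

    import numpy as np
    from scipy import stats

    # Illustrative sketch of a weighted distribution: f_w(x) = w(x) f(x) / E[w(X)].
    x = np.linspace(1e-3, 10, 2000)
    dx = x[1] - x[0]
    f = stats.expon.pdf(x)             # base density f(x): standard exponential
    w = x                              # length-biased weight function w(x) = x
    fw = w * f / np.sum(w * f * dx)    # normalize by a numerical estimate of E[w(X)]

    print(np.sum(fw * dx))             # ~1.0: f_w is a proper density
    print(np.sum(x * f * dx), np.sum(x * fw * dx))  # the mean shifts right under weighting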

Combining User Modeling and Machine Learning to Predict Users' Multimodal Integration Patterns

Temporal as well as semantic constraints on fusion are at the heart of multimodal system processing. The goal of the present work is to develop user-adaptive temporal thresholds with improved performance characteristics over state-of-the-art fixed ones, which can be accomplished by leveraging both empirical user modeling and machine learning techniques to handle the large individual differences ...

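As a toy illustration of what a user-adaptive temporal threshold could look like (a hypothetical rule, not the model developed in this work), one might estimate each user's fusion window from that user's previously observed lags between two modality events and fuse a new pair only if its lag falls within that window:

    import numpy as np

    def user_threshold(observed_lags_ms, percentile=95):
        # Per-user temporal threshold: a high percentile of the lags this user
        # has shown between the two modalities (hypothetical adaptation rule).
        return np.percentile(observed_lags_ms, percentile)

    def should_fuse(onset_a_ms, onset_b_ms, threshold_ms):
        # Fuse the two unimodal events only if their onsets are close enough.
        return abs(onset_a_ms - onset_b_ms) <= threshold_ms

    # Two users with very different multimodal integration patterns.
    lags = {"user_A": [40, 60, 55, 80, 70], "user_B": [300, 450, 380, 500, 420]}

    for user, history in lags.items():
        thr = user_threshold(history)
        print(f"{user}: threshold = {thr:.0f} ms, "
              f"fuse events 200 ms apart? {should_fuse(0, 200, thr)}")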

Semi-supervised Multimodal Learning with Deep Generative Models

In recent years, deep neural networks have been used mainly as discriminative models in multimodal learning. Training them requires large amounts of labeled data, but obtaining such data is difficult because labeling inputs is labor-intensive. Therefore, semi-supervised learning, which improves discriminator performance using unlabeled data, is important. Among semi-supervised learning, me...

Integration of iconic gestures and speech in left superior temporal areas boosts speech comprehension under adverse listening conditions

Iconic gestures are spontaneous hand movements that illustrate certain contents of speech and, as such, are an important part of face-to-face communication. This experiment targets the brain bases of how iconic gestures and speech are integrated during comprehension. Areas of integration were identified on the basis of two classic properties of multimodal integration, bimodal enhancement and in...



Publication date: 2005